Use PosixFileStream for files on POSIX #1855

Draft
wants to merge 5 commits into
base: main

@BCSharp (Member) commented Dec 30, 2024

This PR replaces FileStream with PosixFileStream on .NET/POSIX (not Mono), which offers unbuffered access to the file through its actual file descriptor. This addresses issue #1846 on .NET/POSIX.

I decided to write my own implementation of the stream rather than encapsulate UnixStream. A proxy class around UnixStream would have involved a similar amount of work and complexity, and writing my own allowed me to make different implementation choices than UnixStream, some of whose choices I would call problematic. The best example is UnixStream.Close(), which retries the close syscall if it is interrupted. This will result in error EBADF at best, and close an unrelated file descriptor at worst, because on most systems (including Linux and macOS) close always closes the descriptor, even if the call fails with an error, so a retry is not appropriate. See CPython source code.
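To make the hazard concrete, here is a minimal Python sketch (not the PR's C# code) of what a "retry" of close amounts to once the descriptor has already been released. The helper names `close_once` and `demo_double_close` are hypothetical. It assumes a POSIX system and a single-threaded process, so nothing can reuse the descriptor number between the two calls.

```python
import errno
import os
import tempfile


def close_once(fd):
    """Close fd exactly once; return None on success, or the errno on failure."""
    try:
        os.close(fd)
    except OSError as e:
        return e.errno
    return None


def demo_double_close():
    """Close a descriptor, then 'retry' the close, and report both outcomes."""
    fd, path = tempfile.mkstemp()
    try:
        first = close_once(fd)   # releases the descriptor
        second = close_once(fd)  # the "retry": the descriptor is already gone
        return first, second
    finally:
        os.unlink(path)


first, second = demo_double_close()
print(first, errno.errorcode.get(second))
```

In a multithreaded process, another thread could open a file between the two calls and receive the same descriptor number, in which case the second close would silently close that unrelated file instead of failing.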

Another effect of this PR is that for .NET/POSIX, there are no more "double streams", meaning that operations on file descriptors behave as expected in all cases (modulo bugs and missing pieces).
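As one illustration of descriptor-level behaviour that code may rely on, the Python sketch below shows CPython semantics on POSIX, where an unbuffered file object and its descriptor share a single file offset. The PR does not say whether this exact case was previously affected in IronPython; the function name `read_after_lseek` is hypothetical.

```python
import os
import tempfile


def read_after_lseek():
    """Move the offset through the raw descriptor, then read through the file object."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.close(fd)
        with open(path, "rb", buffering=0) as f:
            # Seek via the descriptor, not via the file object.
            os.lseek(f.fileno(), 3, os.SEEK_SET)
            data = f.read(2)
            return data, f.tell()
    finally:
        os.unlink(path)


print(read_after_lseek())
```

With a single stream backed by the real descriptor, the read starts where `os.lseek` left the offset.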

The performance of this implementation is roughly the same as CPython's. When I first tested it, I was shocked to discover that it was 2.5 times slower, but then I noticed that StreamBox always calls Flush after each write, which was slowing things down significantly. I included a hack to avoid the flush for PosixFileStream, and it helped.
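The reason a flush after every write is redundant for an unbuffered stream can be seen at the Python level. The sketch below uses CPython on any platform (it is not the PR's code, and `size_after_one_byte_write` is a hypothetical helper): it writes one byte and checks the file size before any flush or close.

```python
import os
import tempfile


def size_after_one_byte_write(buffering):
    """Write one byte and return the on-disk size before any flush or close."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        f = open(path, "wb", buffering=buffering)
        try:
            f.write(b"x")
            return os.path.getsize(path)
        finally:
            f.close()
    finally:
        os.unlink(path)


print(size_after_one_byte_write(0))   # unbuffered
print(size_after_one_byte_write(-1))  # default buffering
```

With `buffering=0` the byte is already in the file when `write` returns, so an extra flush adds cost without changing anything; with default buffering the byte stays in memory until a flush.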

Module mmap is due for some cleanup, which is outside of the scope of this PR.

Unfortunately I had to leave Mono behind, that is, on FileStream. Between the limitations/bugs of the MemoryMappedFile interface and those of FileStream, using PosixFileStream (or even UnixStream) on Mono would create a regression in mmap.

This PR also makes sure that all file descriptors created via open or io.open on .NET/POSIX are non-inheritable, in line with #1225.
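For reference, this is the behaviour CPython has had since Python 3.4 (PEP 446), which can be checked with `os.get_inheritable`. The sketch below is a CPython check, not IronPython test code, and `inheritable_flags` is a hypothetical helper name.

```python
import os
import tempfile


def inheritable_flags():
    """Return the inheritable flag of descriptors created by os.open and by open."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            via_os_open = os.get_inheritable(fd)
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            via_io_open = os.get_inheritable(f.fileno())
        return via_os_open, via_io_open
    finally:
        os.unlink(path)


print(inheritable_flags())
```

A non-inheritable descriptor is not leaked into child processes unless the caller opts in explicitly, for example with `os.set_inheritable`.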


// a hack but it does improve performance a lot if flushing is not needed
_writeNeedsFlush = writeStream switch {
FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono
Contributor

In theory the FileStream should already be unbuffered since the FileIO.OpenFile helper uses a buffer size of 1. I'm not sure if it's true in practice, but it looks like at least .NET (Core) has code to handle it...

Member Author

I've just checked the source code of .NET Framework 4.8 and it also performs unbuffered writes when buffer size is 1. I suspect it is the same on 4.6.2. On Mono, however, buffer bypass only happens if the data length is > buffer size, meaning that single-byte writes still land in the 1-byte buffer, not on the disk. We could then optimize flushing by only flushing if IsMono and data length is 1.

But FileIO.OpenFile is not the only way of creating a FileIO. One can use os.open to get a file descriptor and then open a file from it. Currently, PythonNT.open, when creating a FileStream, uses a 4K buffer (DefaultBufferSize), a value that has been in place since the initial commit 11 years ago. I will change it to 1 to get consistent unbuffered behaviour, unless you have counterarguments.
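The flushing rule proposed in this comment can be modelled in a few lines. This is a hypothetical Python model, not the C# in the PR; the function name and parameters are invented for illustration, and it assumes every FileStream is opened with a buffer size of 1 as suggested above.

```python
def write_needs_flush(is_mono: bool, data_len: int, buffer_size: int = 1) -> bool:
    """Model of the proposed rule for a FileStream opened with buffer size 1.

    Assumptions taken from the discussion, not verified here:
    - .NET and .NET Framework write through when the buffer size is 1;
    - Mono bypasses the buffer only when data_len > buffer_size.
    """
    if not is_mono:
        return False
    # On Mono, a write that fits in the buffer stays there until flushed.
    return 0 < data_len <= buffer_size


print(write_needs_flush(True, 1))
print(write_needs_flush(True, 2))
print(write_needs_flush(False, 1))
```

Under these assumptions, only single-byte writes on Mono would still need an explicit flush.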

Contributor

I think we can also open with a buffer size of 0 if that would help with Mono...

Member Author

I don't think so. Mono, .NET, and .NET Framework all throw for buffer size <= 0.

@slozier (Contributor), Dec 31, 2024

Hmm, in the FileStreamOptions.BufferSize docs it says 0 or 1 to disable buffering. I was assuming it'd simply pass along the value to the other constructor. Anyway, it looks like .NET Framework does indeed throw for <= 0 so I guess it doesn't help.

Member Author

You are right, .NET doesn't throw with 0, I was speaking from memory, must have gotten confused with another framework. Or maybe it was throwing in the past (older framework versions).

Anyway, the issue is with Mono, and it does throw at 0, and it does buffer at 1.

_writeNeedsFlush = writeStream switch {
FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono
PosixFileStream => false, // PosixFileStream is unbuffered, used on .NET/Posix
UnixStream => false, // UnixStream is unbuffered, used sometimes on Mono
Contributor

I guess this is causing the test failures since it'll try to load the Mono.Posix assembly (which isn't packaged along with Windows builds).

Member Author

Indeed, it makes sense... The failure on Linux is a bit more interesting as it is failing when dealing with a file opened in the "a" mode for simultaneous logging.
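The failing test itself is not shown in this thread. As background only, the Python sketch below illustrates the append-mode semantics that simultaneous logging typically relies on: on POSIX, every write through a descriptor opened in "a" mode goes to the current end of the file, even when another descriptor has written in between. The function name `interleaved_appends` is hypothetical.

```python
import os
import tempfile


def interleaved_appends():
    """Write alternately through two append-mode file objects on the same file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "ab", buffering=0) as a, open(path, "ab", buffering=0) as b:
            a.write(b"a1\n")
            b.write(b"b1\n")
            a.write(b"a2\n")  # must land after b1, not overwrite it
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


print(interleaved_appends())
```

A stream implementation that tracks its own position instead of relying on the descriptor's append flag would overwrite the other writer's data in this scenario.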

@BCSharp marked this pull request as draft on January 1, 2025, 00:27
@slozier (Contributor) left a comment

Looks like you changed it to draft. Did you have additional changes you wanted to do?

@@ -15,6 +15,8 @@
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;
Contributor

System.Runtime.Serialization is VS trying to be helpful?

@BCSharp (Member, Author) commented Jan 4, 2025

mmap needs more work. Coming soon.

@BCSharp (Member, Author) commented Jan 6, 2025

It was a non-trivial merge. mmap is still not ready (e.g. resize and the file descriptor life cycle are incorrect), and the added tests should expose some problems. @slozier, I see you are also currently working on mmap; I hope we won't step on each other's toes 😄
